Selective MCE training strategy in Mandarin speech recognition
Authors
Abstract
Discriminative training is a promising approach to speech recognition. Discriminative methods based on minimum classification error (MCE) have been studied extensively and applied successfully to speech recognition [1][2][3], speaker recognition [4], and utterance verification [5][6]. Our goal is to modify the embedded string-model-based MCE algorithm so that it can train a large number of cross-syllable triphones in a large-vocabulary recognition system. This paper introduces a selective strategy for MCE-based discriminative training, aimed in particular at Mandarin syllable-loop recognition. We use the syllable-loop recognition task to evaluate the acoustic model of an established large-vocabulary continuous speech recognition system. Because decoding errors occur only in parts of a decoded sentence, it is reasonable to adjust only the parameters of the "wrong models". We therefore introduce a weighted MCE formulation, which converges more effectively and yields about a 10% relative error rate reduction on a large training set. Our experiments also showed that, although overall recognition performance improves, some results that were originally correct are misrecognized after discriminative training. To address this issue, we propose a divide-and-conquer strategy in which the acoustic feature space is divided into two or more sub-spaces according to the discriminative training procedure. Combining the two methods gave an error rate reduction of more than 14.5% in syllable-loop recognition experiments.
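The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, assuming the widely used textbook form of the MCE misclassification measure and sigmoid loss. The function names, the parameters `eta`, `gamma`, and `theta`, the toy scores, and the position-by-position string comparison are all hypothetical choices for illustration. The paper's weighted MCE formulation is not specified in the abstract and is not reproduced here.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """Textbook MCE measure (an assumption, not taken from the abstract):
    d = -g_correct + (1/eta) * log(mean over competitors of exp(eta * g_j)).
    A negative d means the correct label outscores its competitors."""
    competitors = [g for label, g in scores.items() if label != correct]
    # Log-mean-exp, shifted by the maximum for numerical stability.
    shift = max(eta * g for g in competitors)
    mean_exp = sum(math.exp(eta * g - shift) for g in competitors) / len(competitors)
    return -scores[correct] + (shift + math.log(mean_exp)) / eta


def mce_loss(d, gamma=1.0, theta=0.0):
    """Sigmoid smoothing of the 0/1 error count."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


def models_to_update(reference, recognized):
    """Selective step: collect only the models involved in decoding errors.
    Hypothetical simplification: both strings are assumed to be already
    aligned and of equal length."""
    wrong = set()
    for ref, hyp in zip(reference, recognized):
        if ref != hyp:
            wrong.update((ref, hyp))
    return wrong


# Toy log-likelihood scores for one segment (made-up numbers).
scores = {"ma": -10.0, "na": -12.0, "la": -15.0}
d = misclassification_measure(scores, correct="ma")
loss = mce_loss(d)

# Only the models at the misrecognized position are selected.
selected = models_to_update(["ni", "hao", "ma"], ["ni", "hao", "na"])
```

In a real trainer, gradient updates of the loss would be restricted to the parameters of the models in `selected`, leaving correctly decoded models untouched.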
Similar resources
Large-margin minimum classification error training: A theoretical risk minimization perspective
Large-margin discriminative training of hidden Markov models has received significant attention recently. A natural and interesting question is whether the existing discriminative training algorithms can be extended directly to embed the concept of margin. In this paper, we give this question an affirmative answer by showing that the sigmoid bias in the conventional minimum classification error...
Mandarin telephone speech recognition using MCE/GPD-based speaker cluster HMM
In this paper we successfully apply the MCE/GPD method to train speaker cluster HMM. The essential concept of our approach is to adjust all the parameters of the speaker cluster HMM simultaneously using each utterance of the whole training set. In other words, the parameters of each cluster-dependent HMM are no longer independently estimated by using only the training data of the speakers who b...
Stochastic vector mapping-based feature enhancement using prior-models and model adaptation for noisy speech recognition
This paper presents an approach to feature enhancement for noisy speech recognition. Three prior-models are introduced to characterize clean speech, noise and noisy speech, respectively. Sequential noise estimation is employed for prior-model construction based on noise-normalized stochastic vector mapping. Therefore, feature enhancement can work without stereo training data and manual tagging ...
Discriminative noise adaptive training approach for an environment migration
A combined strategy of noise-adaptive training (NAT) and discriminative-based adaptation is proposed for effective migration of speech recognition systems to other noisy environments. NAT is an effective approach for real-field applications, but does not satisfy the minimum classification error (MCE) criterion for the recognition process and adapts poorly to new environments. The proposed metho...
An MRNN-based method for continuous Mandarin speech recognition
A new MRNN-based method for continuous Mandarin speech recognition is proposed. The system uses five RNNs to accomplish many subtasks separately and then combine them to integrally solve the problem. They include two RNNs for the discriminations of the two sub-syllable groups of 100 RFD initials and 39 CI finals, two RNNs for the generations of dynamic weighting functions for sub-syllable’s int...
Publication date: 2001